%load_ext pretty_jupyter
Introduction¶
In this notebook we use Applied Machine Learning (AML) techniques to develop effective strategies for personalized credit card marketing campaigns. Our goal is to use customer and transaction data to build accurate models that predict the probability that a customer will purchase a credit card.
Importing Libraries¶
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
from itables import init_notebook_mode
from datetime import datetime
init_notebook_mode(all_interactive=True)
Load the Data¶
account = pd.read_csv("account.csv", sep=";")
card = pd.read_csv("card.csv", sep=";")
client = pd.read_csv("client.csv", sep=";")
disp = pd.read_csv("disp.csv", sep=";")
district = pd.read_csv("district.csv", sep=";")
loan = pd.read_csv("loan.csv", sep=";")
order = pd.read_csv("order.csv", sep=";")
trans = pd.read_csv("trans.csv", sep=";", low_memory=False)
EDA¶
Account¶
account
| account_id | district_id | frequency | date |
|---|---|---|---|
Card¶
card
| card_id | disp_id | type | issued |
|---|---|---|---|
Client¶
client
| client_id | birth_number | district_id |
|---|---|---|
Disp¶
disp
| disp_id | client_id | account_id | type |
|---|---|---|---|
District¶
district
| A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11 | A12 | A13 | A14 | A15 | A16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Loan¶
loan
| loan_id | account_id | date | amount | duration | payments | status |
|---|---|---|---|---|---|---|
Order¶
order
| order_id | account_id | bank_to | account_to | amount | k_symbol |
|---|---|---|---|---|---|
Trans¶
trans
| trans_id | account_id | date | type | operation | amount | balance | k_symbol | bank | account |
|---|---|---|---|---|---|---|---|---|---|
Transformations¶
data_frames = {}
Account¶
# Translate the Czech statement-frequency labels to English
account["frequency"] = account["frequency"].replace(
    {
        "POPLATEK MESICNE": "MONTHLY ISSUANCE",
        "POPLATEK TYDNE": "WEEKLY ISSUANCE",
        "POPLATEK PO OBRATU": "ISSUANCE AFTER TRANSACTION",
    }
)
# Rename Column
account = account.rename(columns={"frequency": "issuance_statement_frequency"})
# Convert Date Column to datetime format
account["date"] = pd.to_datetime(account["date"])
# Store the cleaned table in the data_frames dictionary defined above
data_frames["account.csv"] = account
# Sample 5 random rows
account.sample(n=5)
| account_id | district_id | issuance_statement_frequency | date |
|---|---|---|---|
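One caveat about the relabelling step: `Series.replace` with a dictionary only rewrites the keys it is given, so a label missing from the mapping would pass through unchanged. The following is a minimal sketch of a stricter variant that fails on unknown labels. It is not part of the original notebook; `toy_account`, `FREQUENCY_LABELS`, and `translate_frequency` are hypothetical names, and the rows are invented for illustration.

```python
import pandas as pd

# Invented stand-in for account.csv (illustration only, not real data).
toy_account = pd.DataFrame(
    {
        "account_id": [1, 2, 3],
        "frequency": ["POPLATEK MESICNE", "POPLATEK TYDNE", "POPLATEK PO OBRATU"],
    }
)

# Same Czech-to-English mapping as in the cell above.
FREQUENCY_LABELS = {
    "POPLATEK MESICNE": "MONTHLY ISSUANCE",
    "POPLATEK TYDNE": "WEEKLY ISSUANCE",
    "POPLATEK PO OBRATU": "ISSUANCE AFTER TRANSACTION",
}


def translate_frequency(df):
    """Return a copy with translated labels; raise if a label is not in the mapping."""
    unknown = set(df["frequency"]) - set(FREQUENCY_LABELS)
    if unknown:
        raise ValueError(f"unmapped frequency labels: {sorted(unknown)}")
    out = df.copy()
    out["frequency"] = out["frequency"].map(FREQUENCY_LABELS)
    return out.rename(columns={"frequency": "issuance_statement_frequency"})


translated = translate_frequency(toy_account)
print(translated)
```

Working on a copy leaves the input frame untouched, which makes the step safe to re-run in a notebook.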
Card¶
card["issued"] = pd.to_datetime(card["issued"], format="mixed")
data_frames["card.csv"] = card
Client¶
# Derive gender and birth date from the birth number
# (format YYMMDD; for women the month is increased by 50)
def parse_details(birth_number):
    # Convert to a string and restore a leading zero lost by integer parsing
    birth_number_str = str(birth_number).zfill(6)
    year_prefix = "19"
    month = int(birth_number_str[2:4])
    gender = "female" if month > 12 else "male"
    if gender == "female":
        month -= 50
    year = int(year_prefix + birth_number_str[:2])
    day = int(birth_number_str[4:6])
    birth_day = datetime(year, month, day)
    return gender, birth_day


# Compute the age in full years as of a reference date
def calculate_age(birth_date, base_date=datetime(1999, 12, 31)):
    return (
        base_date.year
        - birth_date.year
        - ((base_date.month, base_date.day) < (birth_date.month, birth_date.day))
    )


# Apply the functions and create the new columns
client["gender"], client["birth_day"] = zip(
    *client["birth_number"].apply(parse_details)
)
client["age"] = client["birth_day"].apply(calculate_age)
# Keep only the columns needed for the final DataFrame (optional)
client = client[["client_id", "district_id", "gender", "birth_day", "age"]]
# Sample 5 random rows
client.sample(n=5)
| client_id | district_id | gender | birth_day | age |
|---|---|---|---|---|
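To make the birth-number decoding concrete, here is a small worked example using only the standard library. It is a sketch under the same assumptions as the cell above (six digits `YYMMDD`, all birth years in the 1900s, month increased by 50 for women); `decode_birth_number` and `age_on` are hypothetical helper names, and the two birth numbers are invented for illustration.

```python
from datetime import date


def decode_birth_number(birth_number):
    """Split a YYMMDD birth number into (gender, birth date)."""
    digits = str(birth_number).zfill(6)
    year = 1900 + int(digits[:2])  # assumption: every client was born in the 1900s
    month = int(digits[2:4])
    day = int(digits[4:6])
    gender = "male"
    if month > 50:  # months 51..62 encode a female client
        gender = "female"
        month -= 50
    return gender, date(year, month, day)


def age_on(birth_day, reference=date(1999, 12, 31)):
    """Age in full years on the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth_day.month, birth_day.day)
    return reference.year - birth_day.year - (0 if had_birthday else 1)


# Invented birth numbers, for illustration only.
gender_a, born_a = decode_birth_number(706213)  # month field 62 -> female, December
gender_b, born_b = decode_birth_number(450204)  # month field 02 -> male, February

print(gender_a, born_a, age_on(born_a))  # female 1970-12-13 29
print(gender_b, born_b, age_on(born_b))  # male 1945-02-04 54
```

The check `month > 50` is equivalent to the `month > 12` test used above for valid birth numbers, because a shifted month always lies between 51 and 62.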
Disp¶